93 research outputs found

    Continuous Strategy Replicator Dynamics for Multi-Agent Learning

    The problem of multi-agent learning and adaptation has attracted a great deal of attention in recent years. It has been suggested that the dynamics of multi-agent learning can be studied using replicator equations from population biology. Most existing studies have been limited to discrete strategy spaces with a small number of available actions. In many cases, however, the choices available to agents are better characterized by continuous spectra. This paper suggests a generalization of the replicator framework that allows one to study the adaptive dynamics of Q-learning agents with continuous strategy spaces. Instead of probability vectors, agents' strategies are now characterized by probability measures over continuous variables. As a result, the ordinary differential equations for the discrete case are replaced by a system of coupled integro-differential replicator equations that describe the mutual evolution of individual agent strategies. We derive a set of functional equations describing the steady state of the replicator dynamics, examine their solutions for several two-player games, and confirm our analytical results using simulations. Comment: 12 pages, 15 figures, accepted for publication in JAAMA
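    For orientation, the discrete replicator equation and its continuous-strategy analogue can be written as follows. The notation (payoff u, strategy shares x, strategy density p) is ours, and the paper's Q-learning dynamics include additional exploration terms not shown in this sketch.

```latex
% Discrete strategies: the share x_i of action i grows with its payoff
% advantage over the agent's current mixture.
\[
  \dot{x}_i \;=\; x_i \Big[\, u_i(\mathbf{x}) \;-\; \textstyle\sum_j x_j\, u_j(\mathbf{x}) \Big]
\]

% Continuous strategies: the probability vector becomes a density p(s,t)
% over a continuous action variable s, and the sum becomes an integral.
\[
  \frac{\partial p(s,t)}{\partial t}
    \;=\; p(s,t) \Big[\, u(s,p) \;-\; \int u(s',p)\, p(s',t)\, \mathrm{d}s' \Big]
\]
```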

    A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

    Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove, in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies and, in tasks requiring close coordination, the number of failed actions dominates successful actions. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at https://unnat.github.io/cordial-sync . Comment: Accepted to ECCV 2020 (spotlight); Project page: https://unnat.github.io/cordial-sync
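    A minimal sketch of the general idea behind going beyond marginal (product) policies, as we read it from the abstract: agents share a source of randomness to pick a mixture component and then sample their own actions from per-component marginals, so the joint distribution can express correlations that a single product of marginals cannot. All names, sizes, and distributions below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setting: 2 agents, 4 discrete actions each, K mixture components.
K, A = 3, 4
mixture_weights = rng.dirichlet(np.ones(K))          # shared component weights
marginals = rng.dirichlet(np.ones(A), size=(2, K))   # marginals[agent, k] over actions

def sample_joint_action(shared_seed):
    """Sample a coordinated joint action without exchanging the actions themselves.

    Both agents draw the same component k from a shared random seed, then each
    samples its own action from its k-th marginal. The induced joint law,
    sum_k alpha_k * p1_k(a1) * p2_k(a2), is richer than a plain product policy.
    """
    shared = np.random.default_rng(shared_seed)
    k = shared.choice(K, p=mixture_weights)           # identical k for every agent
    return tuple(rng.choice(A, p=marginals[agent, k]) for agent in range(2))

print(sample_joint_action(shared_seed=42))
```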

    Approximate policy iteration: A survey and some new methods

    We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy evaluation is couched in general terms and aims to unify the available methods in the light of recent research developments and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods, such as least-squares temporal difference (LSTD), and iterative methods, such as least-squares policy evaluation (LSPE) and TD(λ), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation when done by the projected equation/TD approach may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. National Science Foundation (U.S.) (No. ECCS-0801549); Los Alamos National Laboratory, Information Science and Technology Institute; United States Air Force (No. FA9550-10-1-0412)
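    As a concrete reference point for the simulation-based matrix inversion approach discussed above, here is a minimal LSTD(0) policy-evaluation sketch with a ridge term standing in for the regularization idea; the feature map, toy data, and regularization weight are illustrative assumptions, not the survey's methods.

```python
import numpy as np

def lstd(transitions, phi, gamma=0.95, ridge=1e-3):
    """Least-squares temporal difference policy evaluation, LSTD(0).

    Accumulates the sample estimates
        A ~ sum_t phi(s_t) (phi(s_t) - gamma * phi(s_{t+1}))^T,
        b ~ sum_t phi(s_t) r_t,
    and solves (A + ridge * I) w = b. The ridge term is a simple stand-in for
    regularization when A is nearly singular.
    """
    d = len(phi(transitions[0][0]))
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    return np.linalg.solve(A + ridge * np.eye(d), b)

# Toy usage: a two-state chain evaluated with one-hot features.
phi = lambda s: np.eye(2)[s]
data = [(0, 1.0, 1), (1, 0.0, 0)] * 50
print(lstd(data, phi))   # approximate value-function weights
```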

    Human Demonstrations for Fast and Safe Exploration in Reinforcement Learning

    Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle and is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller may initially be poor and, for real-life applications, unsafe. In this paper the effects of using human demonstrations on the performance of reinforcement learning are investigated, using a combination of offline and online least squares policy iteration. It is found that using the human as an efficient explorer reduces learning time and improves performance for a benchmark reinforcement learning problem. The benefit of the human demonstration is larger for problems where the human can make use of their understanding of the problem to efficiently explore the state space. Applied to a simplified quadrotor slung-load drop-off problem, the use of human demonstrations reduces the number of crashes during learning. As such, this paper contributes to safer and faster learning for model-free, adaptive control problems.

    Value Iteration for Simple Stochastic Games: Stopping Criterion and Learning Algorithm

    Simple stochastic games can be solved by value iteration (VI), which yields a sequence of under-approximations of the value of the game. This sequence is guaranteed to converge to the value only in the limit. Since no stopping criterion is known, this technique does not provide any guarantees on its results. We provide the first stopping criterion for VI on simple stochastic games. It is achieved by additionally computing a convergent sequence of over-approximations of the value, relying on an analysis of the game graph. Consequently, VI becomes an anytime algorithm returning the approximation of the value and the current error bound. As another consequence, we can provide a simulation-based asynchronous VI algorithm, which yields the same guarantees, but without necessarily exploring the whole game graph. Comment: CAV201
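    A toy, single-player illustration of the anytime idea: iterate a lower and an upper bound on the value and stop once their gap is small, so the result carries an explicit error bound. The data and names are assumptions; the paper's actual contribution handles two-player simple stochastic games and obtains convergent upper bounds via an analysis of the game graph, which this sketch sidesteps by assuming problematic components have already been reduced to explicit sink states.

```python
import numpy as np

def bounded_value_iteration(P, is_target, is_sink, eps=1e-6):
    """Anytime value iteration for max-reachability in a toy MDP.

    L starts at 0 and U at 1 (0 on sinks); both are improved by Bellman
    backups, and the loop stops once max(U - L) < eps, so the returned value
    comes with a guaranteed error bound.
    P has shape (n_actions, n_states, n_states).
    """
    L = np.where(is_target, 1.0, 0.0)
    U = np.where(is_sink, 0.0, 1.0)
    while np.max(U - L) >= eps:
        L_next = np.max(P @ L, axis=0)          # lower-bound backup
        U_next = np.max(P @ U, axis=0)          # upper-bound backup
        L_next[is_target], U_next[is_target] = 1.0, 1.0
        L_next[is_sink], U_next[is_sink] = 0.0, 0.0
        L, U = L_next, U_next
    return (L + U) / 2, np.max(U - L)

# Toy 3-state example: from state 0, one action reaches the target or the sink,
# the other loops or falls into the sink.
P = np.array([[[0.0, 0.4, 0.6], [0, 1, 0], [0, 0, 1]],
              [[0.5, 0.5, 0.0], [0, 1, 0], [0, 0, 1]]])
value, err = bounded_value_iteration(P,
                                     is_target=np.array([False, False, True]),
                                     is_sink=np.array([False, True, False]))
print(value, err)
```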

    Scenario trees and policy selection for multistage stochastic programming using machine learning

    We propose a hybrid algorithmic strategy for complex stochastic optimization problems, which combines the use of scenario trees from multistage stochastic programming with machine learning techniques for learning a policy in the form of a statistical model, in the context of constrained vector-valued decisions. Such a policy allows one to run out-of-sample simulations over a large number of independent scenarios, and obtain a signal on the quality of the approximation scheme used to solve the multistage stochastic program. We propose to apply this fast simulation technique to choose the best tree from a set of scenario trees. A solution scheme is introduced, where several scenario trees with random branching structure are solved in parallel, and where the tree from which the best policy for the true problem could be learned is ultimately retained. Numerical tests show that excellent trade-offs can be achieved between run times and solution quality.
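    A heavily simplified, toy rendition of the selection loop described above: solve several randomly branched trees, learn a policy from each solution as a statistical model, score each policy by out-of-sample simulation, and retain the best. Every function and constant below is an illustrative stand-in (linear least squares as the statistical model, a quadratic toy cost), not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def solve_tree_toy(branching_seed):
    """Stand-in for solving the multistage program on one randomly branched tree."""
    tree_rng = np.random.default_rng(branching_seed)
    states = tree_rng.normal(size=(50, 3))                 # node information
    w_tree = tree_rng.normal(size=3)                       # tree-dependent decision rule
    decisions = states @ w_tree + 0.1 * tree_rng.normal(size=50)
    return states, decisions

def fit_policy(states, decisions):
    """Learn the policy as a statistical model (here: linear least squares)."""
    w, *_ = np.linalg.lstsq(states, decisions, rcond=None)
    return lambda s: s @ w

def out_of_sample_cost(policy, n_scenarios=2000):
    """Fast Monte-Carlo signal on policy quality over independent scenarios."""
    s = rng.normal(size=(n_scenarios, 3))
    target = s @ np.array([1.0, -2.0, 0.5])                # toy 'true problem' decisions
    return float(np.mean((policy(s) - target) ** 2))

# Solve several candidate trees, learn a policy from each, keep the best one.
candidates = [fit_policy(*solve_tree_toy(seed)) for seed in range(5)]
best = min(candidates, key=out_of_sample_cost)
print(out_of_sample_cost(best))
```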

    Robustness of optimal channel reservation using handover prediction in multiservice wireless networks

    The aim of our study is to obtain theoretical limits for the gain that can be expected when using handover prediction and to determine the sensitivity of the system performance to different parameters. We apply an average-reward reinforcement learning approach based on afterstates to the design of optimal admission control policies in mobile multimedia cellular networks where predictive information related to the occurrence of future handovers is available. We consider a type of predictor that labels active mobile terminals in the cell neighborhood a fixed amount of time before handovers are predicted to occur, which we call the anticipation time. The admission controller exploits this information to reserve resources efficiently. We show that there exists an optimum value for the anticipation time at which the highest performance gain is obtained. Although the optimum anticipation time depends on system parameters, we find that its value changes very little when the system parameters vary within a reasonable range. We also find that, in terms of system performance, deploying prediction is always advantageous when compared to a system without prediction, even when the system parameters are estimated with poor precision. © Springer Science+Business Media, LLC 2012. The authors would like to thank the reviewers for their valuable comments that helped to improve the quality of the paper. This work has been supported by the Spanish Ministry of Education and Science and the European Commission (30% PGE, 70% FEDER) under projects TIN2008-06739-C04-02 and TIN2010-21378-C02-02, and by Comunidad de Madrid through project S-2009/TIC-1468. Martínez Bauset, J.; Giménez Guzmán, J. M.; Pla, V. (2012). Robustness of optimal channel reservation using handover prediction in multiservice wireless networks. Wireless Networks, 18(6), 621-633. https://doi.org/10.1007/s11276-012-0423-6
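    For context, an average-reward, afterstate-based temporal-difference update of the generic kind referred to in the abstract might look as follows; the afterstate representation, step sizes, and the running reward-rate estimate are illustrative choices of ours, not the paper's exact algorithm.

```python
import collections

# Hedged sketch of a tabular average-reward TD update over afterstates. An
# 'afterstate' here is the cell occupancy right after an admission decision.
V = collections.defaultdict(float)   # afterstate value estimates
rho = 0.0                            # running estimate of the average reward rate

def td_update(afterstate, reward, sojourn, next_afterstate, alpha=0.01, beta=0.001):
    """One semi-Markov, average-reward TD step between decision epochs."""
    global rho
    delta = reward - rho * sojourn + V[next_afterstate] - V[afterstate]
    V[afterstate] += alpha * delta
    rho += beta * (reward / max(sojourn, 1e-9) - rho)   # simple reward-rate tracker

# Example step: occupancy (2 calls, 1 reservation) evolves after admitting a call.
td_update(afterstate=(2, 1), reward=1.0, sojourn=0.5, next_afterstate=(3, 1))
print(rho, dict(V))
```
    An admission controller built on such estimates would admit an arrival only when the value of the resulting 'admit' afterstate (plus the immediate reward) is at least that of the 'reject' afterstate; how the paper combines this with the anticipation-time information is detailed in the full text.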

    On environment difficulty and discriminating power

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10458-014-9257-1. This paper presents a way to estimate the difficulty and discriminating power of any task instance. We focus on a very general setting for tasks: interactive (possibly multiagent) environments where an agent acts upon observations and rewards. Instead of analysing the complexity of the environment, the state space or the actions that are performed by the agent, we analyse the performance of a population of agent policies against the task, leading to a distribution that is examined in terms of policy complexity. This distribution is then sliced by the algorithmic complexity of the policy and analysed through several diagrams and indicators. The notion of environment response curve is also introduced, by inverting the performance results into an ability scale. We apply all these concepts, diagrams and indicators to two illustrative problems: a class of agent-populated elementary cellular automata, showing how the difficulty and discriminating power may vary for several environments, and a multiagent system, where agents can become predators or prey, and may need to coordinate. Finally, we discuss how these tools can be applied to characterise (interactive) tasks and (multi-agent) environments. These characterisations can then be used to gain more insight into agent performance and to facilitate the development of adaptive tests for the evaluation of agent abilities. I thank the reviewers for their comments, especially those aiming at a clearer connection with the field of multi-agent systems and the suggestion of better approximations for the calculation of the response curves. The implementation of the elementary cellular automata used in the environments is based on the library 'CellularAutomaton' by John Hughes for R [58]. I am grateful to Fernando Soler-Toscano for letting me know about their work [65] on the complexity of 2D objects generated by elementary cellular automata. I would also like to thank David L. Dowe for his comments on a previous version of this paper. This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, the COST - European Cooperation in the field of Scientific and Technical Research IC0801 AT, and the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economia y Competitividad in Spain (PCIN-2013-037). José Hernández-Orallo (2015). On environment difficulty and discriminating power. Autonomous Agents and Multi-Agent Systems, 29(3), 402-454. https://doi.org/10.1007/s10458-014-9257-1
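    One way to picture the methodology: evaluate a population of (here randomly generated) policies on an environment and summarise the resulting performance distribution, whose location and spread serve as rough proxies for difficulty and discriminating power. The two-state toy environment and these proxies are illustrative assumptions, not the paper's indicators.

```python
import numpy as np

rng = np.random.default_rng(2)

def average_reward(policy_table, n_steps=200):
    """Average reward of a tabular policy on a toy two-state environment.

    The toy environment rewards action 1 in state 0 and action 0 in state 1,
    then jumps to a uniformly random state; it merely stands in for the
    interactive environments considered in the paper.
    """
    state, total = 0, 0.0
    for _ in range(n_steps):
        total += 1.0 if policy_table[state] != state else 0.0
        state = int(rng.integers(2))
    return total / n_steps

# A population of randomly generated policies and its performance distribution.
population = [rng.integers(2, size=2) for _ in range(500)]
scores = np.array([average_reward(p) for p in population])

# Location and spread as crude proxies for task difficulty and for how well
# the task separates good policies from bad ones.
print("mean performance:", round(scores.mean(), 3), "spread:", round(scores.std(), 3))
```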

    Learning in Networked Interactions: A Replicator Dynamics Approach

    Many real-world scenarios can be modelled as multi-agent systems, where multiple autonomous decision makers interact in a single environment. The complex and dynamic nature of such interactions prevents hand-crafting solutions for all possible scenarios, hence learning is crucial. Studying the dynamics of multi-agent learning is imperative in selecting and tuning the right learning algorithm for the task at hand. So far, analysis of these dynamics has been mainly limited to normal form games, or unstructured populations. However, many multi-agent systems are highly structured, complex networks, with agents only interacting locally. Here, we study the dynamics of such networked interactions, using the well-known replicator dynamics of evolutionary game theory as a model for learning. Different learning algorithms are modelled by altering the replicator equations slightly. In particular, we investigate lenience as an enabler for cooperation. Moreover, we show how well-connected, stubborn agents can influence the learning outcome. Finally, we investigate the impact of structural network properties on the learning outcome, as well as the influence of mutation driven by exploration.
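    A minimal sketch of replicator dynamics restricted to local interactions on a network, in the spirit of the study above: each agent updates its mixed strategy against the average strategy of its neighbours only. The payoff matrix, ring topology, and step size are illustrative; the lenience and mutation variants mentioned in the abstract would modify the payoff estimate and add an exploration term, respectively.

```python
import numpy as np

# Stag-hunt-like payoff matrix (illustrative): rows = own action, cols = opponent action.
A = np.array([[4.0, 0.0],
              [3.0, 3.0]])

# Ring network of 10 agents: each agent interacts only with its two neighbours.
n = 10
neighbours = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

rng = np.random.default_rng(3)
x = rng.dirichlet(np.ones(2), size=n)   # mixed strategy of each agent

def replicator_step(x, dt=0.05):
    """One Euler step of networked replicator dynamics.

    Each agent plays against the mean strategy of its neighbours; the weight
    of each pure action grows with its payoff advantage over the agent's
    current expected payoff (the standard replicator form, localised).
    """
    x_new = x.copy()
    for i in range(n):
        opp = np.mean(x[neighbours[i]], axis=0)   # local 'field' strategy
        fitness = A @ opp                         # payoff of each pure action
        advantage = fitness - x[i] @ fitness      # advantage over the current mixture
        x_new[i] = x[i] + dt * x[i] * advantage
    return x_new

for _ in range(500):
    x = replicator_step(x)
print(np.round(x, 3))   # strategies typically settle on a local convention
```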